This tutorial is written by Habibur Rahman, Graduate Research Assistant, School of Materials Engineering, Purdue University, West Lafayette, IN Email: msehabibur@purdue.edu
AI (Artificial Intelligence) refers to the development of systems that can perform tasks that typically require human intelligence, such as learning, problem-solving, decision-making, and natural language understanding.
ML (Machine Learning) is a subset of AI that involves training a system to learn and improve from experience without being explicitly programmed. It uses algorithms to analyze and draw insights from large datasets, and these insights are then used to make predictions or decisions.
DL (Deep Learning) is a type of machine learning that uses neural networks with multiple layers to analyze and make sense of complex data. These neural networks are inspired by the structure and function of the human brain and are designed to automatically learn and improve by identifying patterns in large datasets. Deep Learning has been particularly successful in tasks such as image and speech recognition, natural language processing, and autonomous vehicles.
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/1.png', width=500, height=300)
Supervised Learning: Supervised learning is a type of machine learning in which the model is trained on labeled data, meaning that the data is labeled with the correct output values. The goal of supervised learning is to learn a mapping function from inputs to outputs based on the labeled data. The model is then used to make predictions on new, unseen data. Examples of supervised learning algorithms include linear regression, logistic regression, decision trees, and neural networks.
Unsupervised Learning: Unsupervised learning is a type of machine learning in which the model is trained on unlabeled data, meaning that the data does not have any known output values. The goal of unsupervised learning is to discover patterns, relationships, and structures in the data. Examples of unsupervised learning algorithms include clustering, dimensionality reduction, and anomaly detection.
Reinforcement Learning: Reinforcement learning is a type of machine learning in which an agent learns to interact with an environment to achieve a goal. The agent receives feedback in the form of rewards or penalties based on its actions, and it uses this feedback to learn which actions lead to the highest reward. Reinforcement learning is commonly used in game playing and robotics, where the agent must learn to navigate its environment to achieve a specific task.
Transfer Learning: Transfer learning is a machine learning technique in which a model trained on one task is reused as a starting point for a new, related task. The idea is that the knowledge gained from the initial training can be transferred to the new task, reducing the amount of new data and training time required. Transfer learning is commonly used in natural language processing, computer vision, and speech recognition.
Regression Problem: In a regression problem, the goal is to predict a continuous output value based on input variables. The aim is to learn a function that maps the input variables to a continuous output value. For example, predicting the price of a house based on features such as square footage, number of bedrooms, and location is a regression problem. Some examples of regression algorithms include linear regression, decision trees, and neural networks.
Classification Problem: In a classification problem, the goal is to predict a categorical output value based on input variables. The aim is to learn a function that maps the input variables to a discrete output value. For example, predicting whether an email is spam or not spam based on its content is a classification problem. Some examples of classification algorithms include logistic regression, decision trees, and support vector machines.
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/2.png', width=500, height=300)
Classification techniques are used when the target variable is categorical or discrete, and the goal is to classify data into different categories or classes. Some common classification techniques include:
(1) Logistic regression (2) Decision trees (3) Random forests (4) Support vector machines (SVM) (5) Naive Bayes (6) K-Nearest Neighbors (KNN)
Regression techniques, on the other hand, are used when the target variable is continuous, and the goal is to predict a numerical value. Some common regression techniques include:
(1) Linear regression (2) Polynomial regression (3) Ridge regression (4) Lasso regression (5) Elastic Net regression (6) Decision trees
Both classification and regression techniques can be used in supervised learning, where the algorithm is trained on labeled data to make predictions on new, unseen data. However, the specific technique chosen will depend on the nature of the problem and the type of data being analyzed.
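To make the distinction concrete, here is a minimal classification sketch using scikit-learn's LogisticRegression. It uses the built-in iris dataset (not part of this tutorial's files) purely for illustration:

```python
# A minimal classification example: predicting a categorical label (iris species)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = LogisticRegression(max_iter=1000)  # raise max_iter so the solver converges
clf.fit(X_train, y_train)                # train on labeled data
print('Test accuracy:', clf.score(X_test, y_test))
```

The same fit/score workflow applies to regression models; only the estimator class and the evaluation metric change.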
Anaconda: Anaconda is a popular distribution of the Python and R programming languages used for data science and machine learning applications. The Anaconda distribution comes with pre-installed versions of Python, as well as a variety of popular libraries and tools such as NumPy, Pandas, Matplotlib, Jupyter Notebook, and Scikit-learn, among others.
A Conda environment is a self-contained directory or folder that contains a specific version of Python, as well as any packages, dependencies, and libraries required for a specific project or application. Each Conda environment can have its own unique set of packages and dependencies, allowing multiple projects or applications with different requirements to be installed and managed independently. Conda environments are created using the conda create command, which creates a new directory containing a clean installation of Python and the packages specified. The conda activate command can then be used to activate the environment, and any subsequent packages installed will only be installed in that specific environment, rather than globally.
Conda environments are useful for several reasons, including:
(1) Avoiding conflicts: Different projects may require different versions of packages, which can lead to conflicts if installed globally. Conda environments allow you to install specific versions of packages for each project, avoiding conflicts between projects.
(2) Reproducibility: Conda environments allow you to create a snapshot of the package versions and dependencies required for a specific project, making it easy to reproduce the environment at a later time.
(3) Isolation: Conda environments are self-contained, meaning that any changes made to the environment will not affect other environments or the global Python installation.
Module: In Python, a module is a file containing Python definitions and statements. It can define functions, classes, and variables, and can also include runnable code. When you want to use the functionality provided by a module in your code, you can import it using the import keyword. Once you have imported a module, you can access its functions, classes, and variables by prefixing them with the module name. For example, the following code imports the built-in math module and uses its sqrt() function to calculate the square root of 2. Python comes with many built-in modules, and you can also install third-party modules using package managers like pip. You can also create your own modules by defining functions, classes, and variables in a Python file and importing it from another Python script.
import math
x = math.sqrt(2)
print(x)
1.4142135623730951
Method: A method is a function that is associated with an object of a class. It is similar to a function in that it performs a specific task, takes parameters as input, and can return a value as output. However, a method is defined within a class and can only be called on an instance of that class. In the code below, self is a parameter that refers to the instance of the class. When an instance method of a class is called, the self parameter is implicitly passed as the first argument to the method. It allows the method to access and modify the instance's attributes and call other instance methods.
class MyClass:
    def my_method(self, x):
        return x ** 2

def my_function(x):
    return x ** 2
#In this code, my_method is a method of the MyClass class, and can only be called on an instance of that class, like this:
obj = MyClass()
result = obj.my_method(3)
print(result) # Output: 9
9
#On the other hand, my_function is a function defined outside of a class, and can be called like this:
result = my_function(3)
print(result) # Output: 9
9
Note: In scikit-learn (sklearn), linear_model is a module that contains classes for various linear regression models. The linear_model module itself is not a class, but rather a collection of classes and functions related to linear regression. For example, the LinearRegression class is part of the linear_model module and is used to perform ordinary least squares linear regression. Similarly, the Ridge, Lasso, and ElasticNet classes are also part of the linear_model module and are used to perform regularized linear regression. To use a class from the linear_model module, you need to import it into your Python code. For example, to use the LinearRegression class, you can import it as follows:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model = LinearRegression() creates a Linear Regression model object named model. LinearRegression() is a class provided by the scikit-learn library in Python that implements the linear regression algorithm. By calling LinearRegression(), you create an instance of this class, which can then be trained on a dataset using the fit() method and used to make predictions using the predict() method.
Linear regression is a type of regression analysis that models the relationship between a dependent variable (also known as the target variable) and one or more independent variables (also known as predictor variables) as a linear function. The goal of linear regression is to find the best linear relationship that describes how the independent variables are related to the dependent variable. The basic equation for linear regression is: y = b0 + b1*x1 + b2*x2 + ... + bn*xn, where y is the dependent variable, x1, x2, ..., xn are the independent variables, b0 is the intercept, and b1, b2, ..., bn are the coefficients or weights that determine the slope of the linear relationship. Linear regression can be used for both simple linear regression, where there is only one independent variable, and multiple linear regression, where there are multiple independent variables.
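To connect the equation to the code, here is a toy fit on made-up, noise-free data generated from y = 2 + 3*x. After fitting, the model's intercept_ and coef_ attributes recover b0 and b1:

```python
# Toy illustration of y = b0 + b1*x1, using data generated from y = 2 + 3*x
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[0.0], [1.0], [2.0], [3.0]])  # one feature, shape (n_samples, 1)
y = 2 + 3 * x.ravel()                       # exact linear relationship, no noise

model = LinearRegression().fit(x, y)
print('intercept b0:', model.intercept_)    # recovers 2.0
print('coefficient b1:', model.coef_[0])    # recovers 3.0
```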
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/5.png', width=400, height=300)
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/6.png', width=300, height=300)
Let's now create a linear regression model on a single variable using the built-in sklearn library.
Note: Scikit-learn (also known as sklearn) is a popular machine learning library for Python. It provides a wide range of machine learning algorithms for classification, regression, clustering, and dimensionality reduction, as well as tools for model selection and evaluation, data preprocessing, and feature engineering. The library is built on top of NumPy, SciPy, and matplotlib, and it is designed to be easy to use and to integrate with other scientific computing libraries in Python. Some of the most commonly used machine learning algorithms in scikit-learn include linear regression, logistic regression, decision trees, random forests, support vector machines, k-nearest neighbors, and k-means clustering. Scikit-learn is open-source software, and it has a large and active community of users and contributors who continuously improve and expand the library with new features, algorithms, and tools.
# Importing required libraries
# A module is a collection of related functions and classes, while a class is a blueprint for creating objects that have a specific set of attributes and behaviors.
import numpy as np #Calling numpy module
import pandas as pd #Calling pandas module
from sklearn.linear_model import LinearRegression #Calling LinearRegression from sklearn.linear_model module
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Loading the dataset
data = pd.read_csv('/content/drive/MyDrive/DeepMSE/Introduction_to_ML/california_housing.csv')
# Splitting the dataset into training and testing sets
X = data['total_rooms'].values.reshape(-1,1) # Features
# In Pandas, values is an attribute that returns a NumPy array containing the values of a Pandas DataFrame or Series
# The function reshape(-1,1) is being used to reshape the 1-dimensional array of total_rooms values to a 2-dimensional array of shape (n_samples, 1), where n_samples is the number of rows in the original DataFrame.
# For example, if you try to use a 1-dimensional array as input to a scikit-learn machine learning model that expects a 2-dimensional input, you may get an error message like "ValueError: Expected 2D array, got 1D array instead".
y = data['median_house_value'].values # Target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
lr = LinearRegression()
# Training the model
lr.fit(X_train, y_train)
# Making predictions on the test set
y_pred = lr.predict(X_test)
# Evaluating the model's performance
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print('Root Mean Squared Error:', rmse)
Root Mean Squared Error: 110541.30153594677
Let's now create a linear regression model on multiple variables using the built-in sklearn library.
Note: random_state is an argument passed to the train_test_split() function. train_test_split() is used to split the data into training and testing sets. By default, this function will split the data randomly every time it is called, which can lead to different results each time the code is run. Setting the random_state parameter to a fixed value (such as 42) ensures that the data will be split the same way every time the code is run. This can be useful for reproducibility, as well as for comparing the results of different models or algorithms.
# Importing required libraries
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Loading the dataset
data = pd.read_csv('/content/drive/MyDrive/DeepMSE/Introduction_to_ML/california_housing.csv')
# Splitting the dataset into training and testing sets
X = data[['total_rooms', 'latitude', 'total_bedrooms']].values # Features
y = data['median_house_value'].values # Target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Creating a Linear Regression model
lr = LinearRegression()
# Training the model
lr.fit(X_train, y_train)
# Making predictions on the test set
y_pred = lr.predict(X_test)
# Evaluating the model's performance
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print('Root Mean Squared Error:', rmse)
Root Mean Squared Error: 106533.96498421146
Random forest is a popular ensemble learning method that combines multiple decision trees to improve the accuracy and stability of predictions. In a random forest, each decision tree is trained on a randomly selected subset of features and samples, which helps to reduce overfitting and improve generalization performance. The final prediction of the random forest is the average or majority vote of the predictions from all the individual trees. One of the main advantages of random forest is its ability to handle high-dimensional data with complex interactions between features. It can also handle missing data and outliers well, and is generally robust to noise in the data. Random forest is widely used in applications such as classification, regression, and feature selection.
Bootstrapping and bagging are two important techniques used in random forest regressions. Here's an explanation of what they are and how they work:
(1) Bootstrapping: Bootstrapping is a resampling technique that involves creating multiple datasets by randomly sampling the original dataset with replacement. This means that each dataset is created by selecting observations from the original dataset, where each observation has an equal chance of being selected, and some observations may be selected multiple times. In the context of random forest regressions, bootstrapping is used to create multiple decision trees, each trained on a different subset of the original dataset. This helps to reduce overfitting and increase the accuracy of the model by introducing variability and diversity in the trees.
(2) Bagging: Bagging (short for bootstrap aggregating) is a technique that involves aggregating the predictions of multiple models trained on different subsets of the data. In the context of random forest regressions, bagging involves creating an ensemble of decision trees, each trained on a different bootstrap sample of the original dataset. The final prediction of the random forest is then calculated by averaging the predictions of all the decision trees. Bagging helps to reduce the variance of the model and improve its overall performance by reducing the impact of outliers and noise in the data.
Together, bootstrapping and bagging work to create a robust and accurate random forest regression model that is less prone to overfitting and more resilient to noise and outliers in the data.
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/14.png', width=700, height=400)
Classification Examples:
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/8.png', width=1000, height=400)
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/9.png', width=1000, height=400)
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/10.png', width=1100, height=400)
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/11.png', width=1100, height=400)
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/3.png', width=600, height=300)
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/4.png', width=800, height=400)
Let's now create a Random forest regression model for the previous house price prediction data using the built-in sklearn libraries.
# Importing required libraries
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Loading the dataset
data = pd.read_csv('/content/drive/MyDrive/DeepMSE/Introduction_to_ML/california_housing.csv')
# Splitting the dataset into training and testing sets
X = data.drop('median_house_value', axis=1) # Features
y = data['median_house_value'] # Target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Creating a Random Forest Regression model
rf = RandomForestRegressor(n_estimators=100, random_state=42)
# Training the model
rf.fit(X_train, y_train)
# Making predictions on the test set
y_pred = rf.predict(X_test)
# Evaluating the model's performance
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print('Root Mean Squared Error:', rmse)
Root Mean Squared Error: 61034.38587582522
Note: n_estimators is a hyperparameter of the RandomForestRegressor model. It represents the number of decision trees that will be grown in the random forest. In the given example, n_estimators=100 means that the RandomForestRegressor will be built using 100 decision trees. Having more trees in a random forest generally leads to better model performance, but also increases the computation time and memory requirements. In practice, the optimal number of trees depends on the specific problem and dataset, and can be determined through hyperparameter tuning.
import seaborn as sns
importances = pd.Series(rf.feature_importances_, index=X.columns)
sns.heatmap(importances.to_frame(), annot=True, cmap='YlGnBu')
<AxesSubplot:>
An optimized model in machine learning is a model that has been tuned to achieve the best possible performance on a given task or problem. Optimization involves adjusting the hyperparameters of the model and selecting the set of hyperparameters that minimizes the error on the training set and generalizes well to new data. The process of optimizing a model typically involves trying different combinations of hyperparameters and evaluating the performance of the model using a validation set or cross-validation. The goal is to find the set of hyperparameters that produces the best performance on the validation set.
Once a model has been optimized, it can be used to make predictions on new data. However, it is important to keep in mind that the performance of the model on new data may not be as good as its performance on the validation set, especially if the model has overfit to the training data. Therefore, it is important to monitor the performance of the model on new data and re-evaluate the hyperparameters if necessary.
Hyperparameters are settings that are not learned from the data but are set before the training process begins. Examples of hyperparameters include the number of hidden layers in a neural network, the number of trees in a random forest, the learning rate in a gradient descent algorithm, and the regularization strength in a linear regression model.
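Hyperparameter tuning is often automated with scikit-learn's GridSearchCV, which tries every combination in a parameter grid and keeps the best one. As a sketch (the grid values and the synthetic make_regression dataset here are illustrative choices, not from this tutorial's data):

```python
# Sketch: tuning RandomForestRegressor hyperparameters with an exhaustive grid search
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)

param_grid = {'n_estimators': [50, 100], 'max_depth': [3, None]}  # illustrative values
search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid,
                      cv=5, scoring='neg_mean_squared_error')
search.fit(X, y)  # fits one model per grid combination per fold

print('Best hyperparameters:', search.best_params_)
```

After the search, search.best_estimator_ is a model refit on all the data with the winning hyperparameters.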
In machine learning, a loss function is a measure of how well a machine learning model is able to predict the outcome of a given task. The loss function calculates the difference between the predicted values of the model and the true values of the task, and this difference is used to update the model's parameters during training. In regression tasks, where the goal is to predict a continuous value, a common loss function is mean squared error (MSE). This loss function calculates the average squared difference between the predicted values of the model and the true values of the task, and it penalizes the model more for larger errors. The choice of loss function can have a significant impact on the performance of the model, and it is often a key aspect of the model selection and optimization process.
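The MSE definition above is just an average of squared differences, which can be verified by computing it by hand and comparing with scikit-learn (the values here are made up for illustration):

```python
# Mean squared error computed by hand and with scikit-learn
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.0])   # true target values (illustrative)
y_pred = np.array([2.5, 5.0, 4.0])   # model predictions (illustrative)

mse_manual = np.mean((y_true - y_pred) ** 2)   # average squared difference
print(mse_manual)                              # (0.25 + 0 + 4) / 3 = 1.4166...
print(mean_squared_error(y_true, y_pred))      # same value
```

Note how the single large error (2.0) dominates the loss because of the squaring.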
Gradient descent is an optimization algorithm commonly used in machine learning and deep learning to train models by minimizing the loss function. The goal of the algorithm is to find the set of parameters of the model that result in the minimum value of the loss function. The algorithm works by iteratively updating the parameters of the model in the opposite direction of the gradient of the loss function with respect to the parameters. In other words, the algorithm takes steps in the direction that decreases the value of the loss function. There are three main types of gradient descent: batch, stochastic, and mini-batch.
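The update rule can be sketched in a few lines of NumPy for a one-parameter model y_hat = w * x trained on made-up, noise-free data with true w = 2 (batch gradient descent on the MSE loss; the learning rate and iteration count are illustrative):

```python
# Batch gradient descent minimizing MSE for the one-parameter model y_hat = w * x
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                  # data generated with true parameter w = 2

w = 0.0                      # initial parameter guess
lr = 0.05                    # learning rate (step size)
for _ in range(100):
    y_hat = w * x
    grad = np.mean(2 * (y_hat - y) * x)  # d(MSE)/dw
    w -= lr * grad                       # step opposite the gradient

print('learned w:', w)                   # converges close to 2.0
```

Each iteration moves w a small step in the direction that decreases the loss; with this step size the updates contract toward the minimum rather than overshooting it.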
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/15.png', width=500, height=300)
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/7.png', width=500, height=400)
Learning rate is a hyperparameter in many optimization algorithms used in machine learning, including gradient descent, that determines the step size at each iteration when updating the parameters of the model. The learning rate controls how much the parameters are adjusted in the direction of the negative gradient of the loss function. If the learning rate is too large, the optimization algorithm may overshoot the minimum of the loss function and diverge, or it may oscillate around the minimum without converging. If the learning rate is too small, the optimization algorithm may take a long time to converge to the minimum of the loss function, or it may get stuck in a local minimum. A good learning rate is one that allows the optimization algorithm to converge to the minimum of the loss function quickly and efficiently. Choosing the right learning rate often requires experimentation and hyperparameter tuning. In practice, there are several methods to adjust the learning rate during training, such as learning rate schedules, momentum, and adaptive learning rate methods like AdaGrad, RMSProp, and Adam. These methods aim to dynamically adjust the learning rate based on the progress of the optimization algorithm to achieve better performance and convergence.
In machine learning, it is common to divide a dataset into three subsets: the training set, the validation set, and the test set. The training set is used to train the machine learning model, which involves adjusting the model's parameters using an optimization algorithm to minimize the loss function on the training data. The validation set is a subset of the dataset that is used to evaluate the performance of a machine learning model during training and to tune the hyperparameters of the model. During training, the machine learning model is iteratively updated using an optimization algorithm to minimize the loss function on the training data. However, the performance of the model on the training data may not necessarily reflect its performance on new, unseen data. This is because the model may overfit to the training data, meaning that it memorizes the training data and fails to generalize to new data. To prevent overfitting and to evaluate the performance of the model on new, unseen data, the validation set is used. The validation set is typically a subset of the dataset that is not used for training but is used to evaluate the model's performance during training. The model's performance on the validation set is used to determine the best hyperparameters, such as the learning rate, regularization strength, or the number of hidden layers, that optimize the model's performance. The test set is used to evaluate the final performance of the trained model on new, unseen data. The test set should be completely separate from the training and validation sets and should not be used during the training or hyperparameter tuning process.
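One common way to get the three subsets is to call train_test_split twice: first carve out the test set, then split the remainder into training and validation. A sketch on a synthetic dataset (the 60/20/20 proportions are an illustrative choice):

```python
# Splitting a dataset into train / validation / test with two train_test_split calls
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=3, random_state=42)

# First hold out 20% as the final test set
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
# Then split the remainder: 25% of the remaining 80% = 20% of the whole for validation
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # 60 20 20
```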
7. Cross-validation:
5-fold cross-validation is a technique used in machine learning to evaluate the performance of a model. It involves splitting the data into 5 equal-sized subsets, or "folds", where each fold is used once as a validation set and the remaining 4 folds are used as training data. The 5-fold cross-validation process can be broken down into the following steps:
(1) The data is randomly split into 5 folds of equal size.
(2) The model is trained on 4 of the folds (i.e., the training data) and evaluated on the remaining fold (i.e., the validation data).
(3) Step 2 is repeated 5 times, with each fold being used as the validation set once.
(4) The performance of the model is averaged across the 5 validation sets.
The advantage of using 5-fold cross-validation is that it allows for a more robust and reliable estimate of the model's performance compared to a single train-test split. This is because the model is evaluated on 5 different subsets of the data, which helps to reduce the impact of random sampling variations and provides a better estimate of how well the model will generalize to new data. Overall, 5-fold cross-validation is a useful technique for evaluating the performance of machine learning models and can help to ensure that the model is robust and reliable across different subsets of the data.
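scikit-learn's cross_val_score runs this whole procedure in one call. A sketch on a synthetic dataset (the dataset and the R^2 scoring choice are illustrative):

```python
# 5-fold cross-validation of a linear regression model with cross_val_score
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=3, noise=5, random_state=42)

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring='r2')
print('R^2 per fold:', scores)        # one score per validation fold
print('Mean R^2:', scores.mean())     # averaged performance estimate
```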
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/12.png', width=600, height=400)
8. Overfitting, Underfitting and Balanced:
Overfitting and underfitting are two common problems in machine learning where the model either performs poorly on new data or performs too well on training data but poorly on new data. Overfitting occurs when the model is too complex and has learned the noise or random fluctuations in the training data instead of the underlying pattern. As a result, the model fits the training data very well, but it fails to generalize well to new data. Overfitting can occur due to a variety of reasons, such as having too many features, using a complex model with too many parameters, or insufficient training data. Underfitting, on the other hand, occurs when the model is too simple and fails to capture the underlying pattern in the data. This often happens when the model is not complex enough to capture the complexity of the data. As a result, the model has high bias and low variance, and it performs poorly on both the training and test data. The main reasons for underfitting include using a model with too few features or parameters, using a linear model to fit non-linear data, or insufficient training data. To avoid overfitting, techniques such as regularization, early stopping, and cross-validation can be used. To address underfitting, techniques such as increasing the complexity of the model or adding more features can be employed.
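A quick way to see overfitting is to compare training and test scores for an unconstrained decision tree versus a depth-limited one; this sketch uses a synthetic noisy dataset (all values here are illustrative):

```python
# Illustrating overfitting: an unconstrained decision tree vs. a depth-limited one
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

deep = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)
shallow = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X_train, y_train)

# The unconstrained tree fits the noisy training data perfectly (train R^2 = 1.0)
# but its test score drops, which is the signature of overfitting.
print('deep tree    train R^2:', deep.score(X_train, y_train),
      ' test R^2:', deep.score(X_test, y_test))
print('shallow tree train R^2:', shallow.score(X_train, y_train),
      ' test R^2:', shallow.score(X_test, y_test))
```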
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/13.png', width=800, height=300)
9. Descriptors:
In the context of materials science, descriptors refer to a set of quantitative parameters that describe the properties and characteristics of a material. These descriptors can be used to predict the performance and behavior of materials under different conditions and to design new materials with desired properties. Descriptors in materials science can include physical properties such as melting point, boiling point, and density, as well as chemical properties such as electronegativity, ionization energy, and oxidation state. Other descriptors might include crystal structure, surface area, porosity, and thermal conductivity. Machine learning techniques can be used to analyze and model the relationships between these descriptors and material properties, enabling researchers to predict the properties of new materials and to design materials with specific properties for a wide range of applications, from energy storage and conversion to catalysis, electronics, and biomaterials.
from IPython.display import Image
Image(filename='/content/drive/MyDrive/DeepMSE/Introduction_to_ML/16.png', width=500, height=200)
10. Kernel in Jupyter Notebook:
In Jupyter Notebook, a kernel is a program that runs in the background and provides computational support for a specific programming language. The kernel receives code in the form of cells, executes that code, and sends the output back to the notebook interface, where it can be displayed as text, plots, or other visualizations. Jupyter Notebook supports kernels for a wide variety of programming languages, including Python, R, Julia, and many others. Each kernel has its own set of installed packages, configuration options, and runtime environment, which allows users to work with multiple languages and versions on the same machine. When you open a Jupyter Notebook, you can choose which kernel to use for that notebook by selecting the appropriate option from the "New" dropdown menu or by changing the kernel in the "Kernel" dropdown menu at the top of the notebook interface. Once you have selected a kernel, all code cells in that notebook will use that kernel for execution until you switch to a different kernel.
Acknowledgement
%%shell
jupyter nbconvert --to html "/content/drive/MyDrive/Colab Notebooks/Introduction_to_ML.ipynb"
overwriting the existing notebook.
Equivalent to: [--NbConvertApp.use_output_suffix=False --NbConvertApp.export_format=notebook --FilesWriter.build_directory= --ClearOutputPreprocessor.enabled=True]
--no-prompt
Exclude input and output prompts from converted document.
Equivalent to: [--TemplateExporter.exclude_input_prompt=True --TemplateExporter.exclude_output_prompt=True]
--no-input
Exclude input cells and output prompts from converted document.
This mode is ideal for generating code-free reports.
Equivalent to: [--TemplateExporter.exclude_output_prompt=True --TemplateExporter.exclude_input=True --TemplateExporter.exclude_input_prompt=True]
--allow-chromium-download
Whether to allow downloading chromium if no suitable version is found on the system.
Equivalent to: [--WebPDFExporter.allow_chromium_download=True]
--disable-chromium-sandbox
Disable chromium security sandbox when converting to PDF..
Equivalent to: [--WebPDFExporter.disable_sandbox=True]
--show-input
Shows code input. This flag is only useful for dejavu users.
Equivalent to: [--TemplateExporter.exclude_input=False]
--embed-images
Embed the images as base64 dataurls in the output. This flag is only useful for the HTML/WebPDF/Slides exports.
Equivalent to: [--HTMLExporter.embed_images=True]
--sanitize-html
Whether the HTML in Markdown cells and cell outputs should be sanitized..
Equivalent to: [--HTMLExporter.sanitize_html=True]
--log-level=<Enum>
Set the log level by value or name.
Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
Default: 30
Equivalent to: [--Application.log_level]
--config=<Unicode>
Full path of a config file.
Default: ''
Equivalent to: [--JupyterApp.config_file]
--to=<Unicode>
The export format to be used, either one of the built-in formats
['asciidoc', 'custom', 'html', 'latex', 'markdown', 'notebook', 'pdf', 'python', 'rst', 'script', 'slides', 'webpdf']
or a dotted object name that represents the import path for an
``Exporter`` class
Default: ''
Equivalent to: [--NbConvertApp.export_format]
--template=<Unicode>
Name of the template to use
Default: ''
Equivalent to: [--TemplateExporter.template_name]
--template-file=<Unicode>
Name of the template file to use
Default: None
Equivalent to: [--TemplateExporter.template_file]
--theme=<Unicode>
Template specific theme(e.g. the name of a JupyterLab CSS theme distributed
as prebuilt extension for the lab template)
Default: 'light'
Equivalent to: [--HTMLExporter.theme]
--sanitize_html=<Bool>
Whether the HTML in Markdown cells and cell outputs should be sanitized.This
should be set to True by nbviewer or similar tools.
Default: False
Equivalent to: [--HTMLExporter.sanitize_html]
--writer=<DottedObjectName>
Writer class used to write the
results of the conversion
Default: 'FilesWriter'
Equivalent to: [--NbConvertApp.writer_class]
--post=<DottedOrNone>
PostProcessor class used to write the
results of the conversion
Default: ''
Equivalent to: [--NbConvertApp.postprocessor_class]
--output=<Unicode>
overwrite base name use for output files.
can only be used when converting one notebook at a time.
Default: ''
Equivalent to: [--NbConvertApp.output_base]
--output-dir=<Unicode>
Directory to write output(s) to. Defaults
to output to the directory of each notebook. To recover
previous default behaviour (outputting to the current
working directory) use . as the flag value.
Default: ''
Equivalent to: [--FilesWriter.build_directory]
--reveal-prefix=<Unicode>
The URL prefix for reveal.js (version 3.x).
This defaults to the reveal CDN, but can be any url pointing to a copy
of reveal.js.
For speaker notes to work, this must be a relative path to a local
copy of reveal.js: e.g., "reveal.js".
If a relative path is given, it must be a subdirectory of the
current directory (from which the server is run).
See the usage documentation
(https://nbconvert.readthedocs.io/en/latest/usage.html#reveal-js-html-slideshow)
for more details.
Default: ''
Equivalent to: [--SlidesExporter.reveal_url_prefix]
--nbformat=<Enum>
The nbformat version to write.
Use this to downgrade notebooks.
Choices: any of [1, 2, 3, 4]
Default: 4
Equivalent to: [--NotebookExporter.nbformat_version]
Examples
--------
The simplest way to use nbconvert is
> jupyter nbconvert mynotebook.ipynb --to html
Options include ['asciidoc', 'custom', 'html', 'latex', 'markdown', 'notebook', 'pdf', 'python', 'rst', 'script', 'slides', 'webpdf'].
> jupyter nbconvert --to latex mynotebook.ipynb
Both HTML and LaTeX support multiple output templates. LaTeX includes
'base', 'article' and 'report'. HTML includes 'basic', 'lab' and
'classic'. You can specify the flavor of the format used.
> jupyter nbconvert --to html --template lab mynotebook.ipynb
You can also pipe the output to stdout, rather than a file
> jupyter nbconvert mynotebook.ipynb --stdout
PDF is generated via latex
> jupyter nbconvert mynotebook.ipynb --to pdf
You can get (and serve) a Reveal.js-powered slideshow
> jupyter nbconvert myslides.ipynb --to slides --post serve
Multiple notebooks can be given at the command line in a couple of
different ways:
> jupyter nbconvert notebook*.ipynb
> jupyter nbconvert notebook1.ipynb notebook2.ipynb
or you can specify the notebooks list in a config file, containing::
c.NbConvertApp.notebooks = ["my_notebook.ipynb"]
> jupyter nbconvert --config mycfg.py
To see all available configurables, use `--help-all`.
--------------------------------------------------------------------------- CalledProcessError Traceback (most recent call last) <ipython-input-4-5c3cc0275334> in <module> ----> 1 get_ipython().run_cell_magic('shell', '', 'jupyter nbconvert --to html ////content/drive/MyDrive/Colab Notebooks/Introduction_to_ML.ipynb\n') /usr/local/lib/python3.9/dist-packages/google/colab/_shell.py in run_cell_magic(self, magic_name, line, cell) 331 if line and not cell: 332 cell = ' ' --> 333 return super().run_cell_magic(magic_name, line, cell) 334 335 /usr/local/lib/python3.9/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell) 2357 with self.builtin_trap: 2358 args = (magic_arg_s, cell) -> 2359 result = fn(*args, **kwargs) 2360 return result 2361 /usr/local/lib/python3.9/dist-packages/google/colab/_system_commands.py in _shell_cell_magic(args, cmd) 110 result = _run_command(cmd, clear_streamed_output=False) 111 if not parsed_args.ignore_errors: --> 112 result.check_returncode() 113 return result 114 /usr/local/lib/python3.9/dist-packages/google/colab/_system_commands.py in check_returncode(self) 135 def check_returncode(self): 136 if self.returncode: --> 137 raise subprocess.CalledProcessError( 138 returncode=self.returncode, cmd=self.args, output=self.output 139 ) CalledProcessError: Command 'jupyter nbconvert --to html ////content/drive/MyDrive/Colab Notebooks/Introduction_to_ML.ipynb ' returned non-zero exit status 255.